Event detection in location-based social networks
With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; e.g., Osama Bin Laden’s death was first confirmed on Twitter, even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. To set forth the differences between both approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities. 
Last but not least, we present the algorithmic changes and data processing frameworks needed to scale the proposed techniques up to big data workloads. This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer Reviewed. Postprint (author's final draft)
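Tweet-SCAN is described here only as a data mining technique. Its name and the later DBSCAN-like framing suggest density-based clustering over tweets. The sketch below is a minimal DBSCAN variant in which two tweets are neighbours only if they are close in both space and time. It makes several assumptions. The `(x_km, y_km, t_minutes)` record and the names `eps_km`, `eps_min` and `min_pts` are hypothetical. The textual and user-diversity criteria that the real technique may use are omitted.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical tweet record: (x_km, y_km, t_minutes). Text and user information
# are ignored in this sketch; only the space-time neighbourhood is modelled.
Tweet = Tuple[float, float, float]
NOISE = -1


def neighbours(tweets: List[Tweet], i: int, eps_km: float, eps_min: float) -> List[int]:
    """Indices of tweets within eps_km in space AND eps_min in time of tweet i (incl. i)."""
    xi, yi, ti = tweets[i]
    return [j for j, (x, y, t) in enumerate(tweets)
            if math.hypot(x - xi, y - yi) <= eps_km and abs(t - ti) <= eps_min]


def space_time_dbscan(tweets: List[Tweet], eps_km: float, eps_min: float,
                      min_pts: int) -> List[int]:
    """Label each tweet with an event id, or NOISE."""
    labels: List[Optional[int]] = [None] * len(tweets)
    cluster = 0
    for i in range(len(tweets)):
        if labels[i] is not None:
            continue
        seeds = neighbours(tweets, i, eps_km, eps_min)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster            # i is a core point: start a new event
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins the event but is not expanded
                continue
            if labels[j] is not None:
                continue               # already assigned
            labels[j] = cluster
            nbrs = neighbours(tweets, j, eps_km, eps_min)
            if len(nbrs) >= min_pts:   # j is also a core point: keep growing the event
                queue.extend(nbrs)
        cluster += 1
    return [NOISE if lab is None else lab for lab in labels]
```

Requiring both thresholds at once is the reason tweets sent from the same place hours apart do not merge into one event. The neighbourhood scan is quadratic, which is one motivation for the scaling work mentioned above.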
Dual Stochastic Natural Gradient Descent
Although theoretically appealing, Stochastic Natural Gradient Descent (SNGD) is computationally expensive, has been shown to be highly sensitive to the learning rate, and is not guaranteed to converge. Convergent Stochastic Natural Gradient Descent (CSNGD) aims to solve the last two problems. However, the computational expense of CSNGD is still unacceptable when the number of parameters is large. In this paper we introduce Dual Stochastic Natural Gradient Descent (DSNGD), which takes advantage of dually flat manifolds to obtain a robust alternative to SNGD that is also computationally feasible. Comment: 16 pages
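For exponential families, the natural parameters θ and the expectation parameters η form a pair of dually flat coordinate systems. A well-known consequence is that the natural gradient in θ equals the ordinary gradient in η, so no Fisher matrix has to be inverted. The toy Bernoulli sketch below checks this identity numerically and then runs a plain constant-step SNGD loop. It illustrates the geometry the abstract refers to and is not the DSNGD algorithm. The function names are hypothetical.

```python
import math


def sigmoid(theta: float) -> float:
    return 1.0 / (1.0 + math.exp(-theta))


def natural_grad_theta(x: int, theta: float) -> float:
    """Natural gradient of log p(x | theta) for a Bernoulli in its natural parameter
    theta = logit(p): the ordinary gradient (x - p) premultiplied by F(theta)^-1,
    where the Fisher information is F(theta) = p (1 - p)."""
    p = sigmoid(theta)
    return (x - p) / (p * (1.0 - p))


def grad_eta(x: int, eta: float) -> float:
    """Ordinary (Euclidean) gradient of log p(x | eta) = x log eta + (1 - x) log(1 - eta)
    in the expectation parameter eta = p. No Fisher matrix is involved."""
    return x / eta - (1 - x) / (1 - eta)


def sngd_bernoulli(samples, theta0: float = 0.0, lr: float = 0.01) -> float:
    """Plain SNGD on theta with a constant learning rate; returns the fitted p."""
    theta = theta0
    for x in samples:
        theta += lr * natural_grad_theta(x, theta)
    return sigmoid(theta)
```

With a large `lr` and a small p, the step `lr / p` after a positive sample becomes large. This hints at the learning-rate sensitivity mentioned in the abstract.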
Scaling DBSCAN-like algorithms for event detection systems in Twitter
The increasing use of mobile social networks has lately transformed news media. Real-world events are nowadays reported in social networks much faster than in traditional channels. As a result, the autonomous detection of events from networks like Twitter has gained a lot of interest in both research and media groups. DBSCAN-like algorithms constitute a well-known clustering approach to retrospective event detection. However, scaling such algorithms to geographically large regions and temporally long periods presents two major shortcomings. First, detecting real-world events from the vast amount of tweets can no longer be performed on a single machine. Second, the tweeting activity varies a lot within these broad space-time regions, limiting the use of global parameters. Against this background, we propose to scale DBSCAN-like event detection techniques by parallelizing and distributing them through a novel density-aware MapReduce scheme. The proposed scheme partitions tweet data according to its spatial and temporal features and tailors local DBSCAN parameters to local tweet densities. We implement the scheme in Apache Spark and evaluate its performance on a dataset composed of geo-located tweets in the Iberian peninsula during the course of several football matches. The results point to the benefits of our proposal over other state-of-the-art techniques in terms of speed-up and detection accuracy. Peer Reviewed. Postprint (author's final draft)
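The abstract does not specify the partitioning scheme or the rule used to tailor parameters. The sketch below is therefore heavily assumption-laden and covers only the idea of per-partition parameters:

1. Tweets are keyed by a fixed space-time grid cell (the map step).
2. Tweets are grouped per cell (the reduce step).
3. Each cell receives its own spatial `eps` from the common k-distance heuristic for DBSCAN.

The paper's partitioning is density-aware, whereas this grid is fixed. Cell sizes, `k`, the heuristic and all names are hypothetical, and handling of events that straddle cell borders is omitted. In Spark, steps 1 and 2 would run distributed (for example, an RDD `map` followed by `groupByKey`). Here they run in plain Python.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Tuple

Tweet = Tuple[float, float, float]   # hypothetical (x_km, y_km, t_hours)
Cell = Tuple[int, int, int]


def cell_key(tweet: Tweet, cell_km: float, cell_hours: float) -> Cell:
    """Map step: key each tweet by the space-time grid cell it falls in."""
    x, y, t = tweet
    return (math.floor(x / cell_km), math.floor(y / cell_km), math.floor(t / cell_hours))


def group_by_cell(tweets: List[Tweet], cell_km: float,
                  cell_hours: float) -> Dict[Cell, List[Tweet]]:
    """Shuffle/reduce step: collect the tweets of each cell together."""
    cells: Dict[Cell, List[Tweet]] = defaultdict(list)
    for tw in tweets:
        cells[cell_key(tw, cell_km, cell_hours)].append(tw)
    return dict(cells)


def local_eps_km(points: List[Tweet], k: int) -> Optional[float]:
    """k-distance heuristic: median spatial distance from each tweet to its k-th nearest neighbour."""
    if len(points) <= k:
        return None   # too few tweets in this cell to estimate a local density
    kth = []
    for i, (xi, yi, _) in enumerate(points):
        dists = sorted(math.hypot(x - xi, y - yi)
                       for j, (x, y, _) in enumerate(points) if j != i)
        kth.append(dists[k - 1])
    return median(kth)


def tailor_eps(tweets: List[Tweet], cell_km: float, cell_hours: float,
               k: int) -> Dict[Cell, Optional[float]]:
    """Per-cell eps: dense cells get a small radius, sparse cells a large one."""
    return {cell: local_eps_km(pts, k)
            for cell, pts in group_by_cell(tweets, cell_km, cell_hours).items()}
```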
Robust Bayesian Linear Classifier Ensembles
The original publication is available at http://www.springerlink.com
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of models that can be learnt in order to lower computational costs. In these restricted spaces, where incorrect modelling assumptions may be made, uniform averaging sometimes performs even better than Bayesian model averaging (BMA). Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. We then empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation. Peer reviewed
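For expectation maximization over linear mixture weights, a standard form treats the mixture component that "explains" each training label as latent. The sketch below implements that EM under two assumptions: the component models are fixed and already evaluated, and the weights have a Dirichlet prior. The prior choice and the names are hypothetical, and the paper's exact derivation for one-dependence estimators may differ. Starting from uniform weights makes uniform averaging the initial point of the search.

```python
import numpy as np


def map_mixture_weights(lik: np.ndarray, alpha: np.ndarray, n_iter: int = 200) -> np.ndarray:
    """EM for the MAP weights w of the linear mixture sum_k w_k p_k(y | x).

    lik[i, k]: probability model k assigns to the observed class of example i (> 0).
    alpha:     Dirichlet prior parameters over w, each >= 1 (hypothetical prior choice).
    """
    n, k = lik.shape
    w = np.full(k, 1.0 / k)                              # start from uniform averaging
    for _ in range(n_iter):
        joint = lik * w                                  # w_k p_k(y_i | x_i), shape (n, k)
        resp = joint / joint.sum(axis=1, keepdims=True)  # E-step: responsibilities
        # M-step: mode of the Dirichlet posterior given the expected counts.
        w = (resp.sum(axis=0) + alpha - 1.0) / (n + alpha.sum() - k)
    return w
```

Requiring every `alpha_k >= 1` keeps each M-step update a valid probability vector. With all `alpha_k = 1`, the update reduces to maximum-likelihood EM.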
A Robust Solution to Variational Importance Sampling of Minimum Variance
Importance sampling is a Monte Carlo method in which samples are drawn from an alternative proposal distribution. This can be used to focus the sampling process on the relevant parts of the space, thus reducing the variance. Selecting the proposal that leads to the minimum variance can be formulated as an optimization problem and solved, for instance, with a variational approach. Variational inference selects, from a given family, the distribution which minimizes the divergence to the distribution of interest. The Rényi projection of order 2 leads to the importance sampling estimator of minimum variance, but its computation is very costly. In this study, with discrete distributions that factorize over probabilistic graphical models, we propose and evaluate an approximate projection method onto fully factored distributions. Our evaluation suggests that a proposal distribution mixing the information projection with the approximate Rényi projection of order 2 could be of practical interest.
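The importance weights w = p/q satisfy E_q[w] = 1. Their variance is therefore Σ p²/q − 1 = exp(D_2(p‖q)) − 1, where D_2 is the Rényi divergence of order 2. This is why the Rényi projection of order 2 yields the minimum-variance proposal. The sketch below makes this concrete for a hypothetical target over two binary variables:

* It brute-forces the best fully factored proposal on a grid.
* It compares that proposal with the product of marginals.

Exhaustive enumeration is only feasible for toy models. It stands in for the approximate projection method the paper proposes and is not that method.

```python
import itertools

# Hypothetical target p over two binary variables (x1, x2); it does not factorize.
P = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}


def factored_q(a: float, b: float):
    """Fully factored proposal q(x1, x2) = q1(x1) q2(x2), with q1(1) = a and q2(1) = b."""
    return lambda x1, x2: (a if x1 else 1 - a) * (b if x2 else 1 - b)


def weight_variance(p, q) -> float:
    """Var_q[p/q] = sum_x p(x)^2 / q(x) - 1 = exp(D_2(p || q)) - 1."""
    return sum(px * px / q(*x) for x, px in p.items()) - 1.0


def best_factored_proposal(p, grid_steps: int = 99):
    """Grid search for the fully factored q minimising the weight variance."""
    grid = [i / (grid_steps + 1) for i in range(1, grid_steps + 1)]
    return min(itertools.product(grid, grid),
               key=lambda ab: weight_variance(p, factored_q(*ab)))
```

For this `P`, the product of marginals is `factored_q(0.3, 0.4)`. That point lies on the search grid, so the grid optimum can only match or beat it in weight variance.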
A Survey on Sensor Networks from a Multiagent Perspective
Sensor networks (SNs) have arisen as one of the most promising technologies for the coming decades. The recent emergence of small and inexpensive sensors based upon microelectromechanical systems eases the development and proliferation of these networks in a wide range of real-world applications. Multiagent systems (MAS) have been identified as one of the most suitable technologies to contribute to the deployment of SNs that exhibit flexibility, robustness and autonomy. The purpose of this survey is twofold. On the one hand, we review the most relevant contributions of agent technologies to this emerging application domain. On the other hand, we identify the challenges that researchers must address to establish MAS as the key enabling technology for SNs. This work has been funded by projects IEA (TIN2006-15662-C02-01), Agreement Technologies (CONSOLIDER CSD2007-0022, INGENIO 2010), EVE (TIN2009-14702-C02-01, TIN2009-14702-C02-02) and Generalitat de Catalunya under the grant 2009-SGR-1434. Meritxell Vinyals is supported by the Spanish Ministry of Education (FPU grant AP2006-04636). Peer Reviewed
Algorithms for Graph-Constrained Coalition Formation in the Real World
Coalition formation typically involves the coming together of multiple,
heterogeneous agents to achieve both their individual and collective goals. In
this paper, we focus on a special case of coalition formation known as
Graph-Constrained Coalition Formation (GCCF) whereby a network connecting the
agents constrains the formation of coalitions. We focus on this type of problem
given that in many real-world applications, agents may be connected by a
communication network or only trust certain peers in their social network. We
propose a novel representation of this problem based on the concept of edge
contraction, which allows us to model the search space induced by the GCCF
problem as a rooted tree. Then, we propose an anytime solution algorithm
(CFSS), which is particularly efficient when applied to a general class of
characteristic functions called functions. Moreover, we show how CFSS can
be efficiently parallelised to solve GCCF using a non-redundant partition of
the search space. We benchmark CFSS on both synthetic and realistic scenarios,
using a real-world dataset consisting of the energy consumption of a large
number of households in the UK. Our results show that, in the best case, the
serial version of CFSS is 4 orders of magnitude faster than the state of the
art, while the parallel version is 9.44 times faster than the serial version on
a 12-core machine. Moreover, CFSS is the first approach to provide anytime
approximate solutions with quality guarantees for very large systems of agents
(i.e., with more than 2700 agents). Comment: Accepted for publication, cite as "in press"
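The search space above is built from edge contractions. The sketch below illustrates that idea only; it is not CFSS, which adds bounds, anytime behaviour and parallelism. It enumerates every graph-constrained coalition structure, meaning every partition of the agents into coalitions that each induce a connected subgraph. For each edge in turn, it branches two ways:

* Contract the edge, so its endpoints end up in the same coalition.
* Mark the edge, so the two current coalitions are never merged.

Each structure is produced exactly once. The function and variable names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Edge = Tuple[int, int]
Partition = FrozenSet[FrozenSet[int]]


def graph_constrained_partitions(nodes: Sequence[int], edges: Sequence[Edge]) -> List[Partition]:
    results: List[Partition] = []

    def recurse(i: int, block_of: Dict[int, int], forbidden: Set[FrozenSet[int]]) -> None:
        if i == len(edges):                       # leaf: every edge has been decided
            blocks: Dict[int, Set[int]] = {}
            for n in nodes:
                blocks.setdefault(block_of[n], set()).add(n)
            results.append(frozenset(frozenset(b) for b in blocks.values()))
            return
        u, v = edges[i]
        bu, bv = block_of[u], block_of[v]
        if bu == bv:                              # already in the same coalition
            recurse(i + 1, block_of, forbidden)
            return
        if frozenset((bu, bv)) not in forbidden:
            # Branch 1: contract the edge, i.e. merge coalition bv into bu.
            merged = {n: bu if b == bv else b for n, b in block_of.items()}
            relabelled = {frozenset(bu if b == bv else b for b in pair) for pair in forbidden}
            recurse(i + 1, merged, relabelled)
        # Branch 2: mark the edge so these two coalitions are never merged.
        recurse(i + 1, block_of, forbidden | {frozenset((bu, bv))})

    recurse(0, {n: n for n in nodes}, set())
    return results
```

Every coalition grows only by contracting an edge, so it stays connected. This is why, on a path 1–2–3, the structure {1, 3}{2} is never generated.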
A tutorial on optimization for multi-agent systems
Research on optimization in multi-agent systems (MASs) has contributed a wealth of techniques to solve many of the challenges arising in a wide range of multi-agent application domains. Multi-agent optimization focuses on casting MAS problems into optimization problems. Solving these problems may involve the active participation of the agents in a MAS. Research on multi-agent optimization has rapidly become a very technical, specialized field. Moreover, the contributions to the field in the literature are largely scattered. These two factors dramatically hinder access to a basic, general view of the foundations of the field. This tutorial is intended to ease such access by providing a gentle introduction to fundamental concepts and techniques in multi-agent optimization. © 2013 The Author. Peer Reviewed
Constructing a unifying theory of dynamic programming DCOP algorithms via the generalized distributive law
In this paper we propose a novel message-passing algorithm, the so-called Action-GDL, as an extension to the generalized distributive law (GDL) to efficiently solve distributed constraint optimization problems (DCOPs). Action-GDL provides a unifying perspective on several dynamic programming DCOP algorithms that are based on GDL, such as the DPOP and DCPOP algorithms. We empirically show how Action-GDL, using a novel distributed post-processing heuristic, can outperform DCPOP, and by extension DPOP, even when the latter uses the best arrangement provided by multiple state-of-the-art heuristics. Work funded by IEA (TIN2006-15662-C02-01), AT (CONSOLIDER CSD2007-0022, INGENIO 2010) and EVE (TIN2009-14702-C02-01 and 02). Vinyals is supported by the Spanish Ministry of Education (FPU grant AP2006-04636). Peer Reviewed
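DPOP, one of the algorithms Action-GDL generalizes, follows a two-pass dynamic-programming pattern:

1. UTIL messages travel from the leaves to the root of a pseudo-tree.
2. VALUE messages travel back down from the root.

The sketch below shows that pattern on the simplest possible pseudo-tree, a chain of variables with pairwise utilities, run centrally. It is not Action-GDL; it omits distribution and the post-processing heuristic, and the names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Utility = Callable[[int, int], float]


def dpop_chain(domain: Sequence[int], factors: Sequence[Utility]) -> Tuple[float, List[int]]:
    """Maximise sum_i factors[i](x_i, x_{i+1}) over a chain x_0 - x_1 - ... - x_n.

    x_0 is the root of the pseudo-tree and x_n is the leaf."""
    msg: Dict[int, float] = {b: 0.0 for b in domain}    # nothing below the leaf
    best_child: List[Dict[int, int]] = []              # best_child[i][a]: best x_{i+1} given x_i = a
    # UTIL phase: each variable tells its parent the best utility achievable in the
    # sub-chain below it, for every value the parent might take.
    for f in reversed(factors):
        util: Dict[int, float] = {}
        choice: Dict[int, int] = {}
        for a in domain:
            b_best = max(domain, key=lambda b: f(a, b) + msg[b])
            util[a], choice[a] = f(a, b_best) + msg[b_best], b_best
        msg = util
        best_child.insert(0, choice)
    # VALUE phase: the root picks its best value and each variable passes its
    # assignment down so that its child can look up its best response.
    x = [max(domain, key=lambda a: msg[a])]
    for choice in best_child:
        x.append(choice[x[-1]])
    return msg[x[0]], x
```

Each UTIL message holds one entry per parent value. On a chain, the cost is therefore linear in the number of variables, instead of exponential as in brute-force enumeration.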